Harvesting for Full-Text Retrieval

Authors

  • Fabio Simeoni
  • Murat Yakici
  • Steve Neely
  • Fabio Crestani
Abstract

We propose an approach to Distributed Information Retrieval based on the periodic and incremental centralisation of full-text indices of widely dispersed and autonomously managed content sources. Inspired by the success of the Open Archives Initiative's protocol for metadata harvesting, the approach occupies the middle ground between (i) the crawling of content and (ii) the distribution of retrieval. As in crawling, some data moves towards the retrieval process, but it is statistics about the content rather than the content itself. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval itself. We show that the approach retains the good properties of centralised retrieval without giving up cost-effective resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure.
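
The division of labour described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the OAI protocol itself: the `Source` and `Harvester` classes, the integer datestamps, and the whitespace tokeniser are all assumptions made for illustration. Each source indexes its own documents and exposes only term statistics, while a central harvester periodically pulls the records that changed since its last visit and answers queries from the merged index.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical content source that indexes its own documents locally."""
    name: str
    # doc_id -> (datestamp, term statistics); the content itself is never exported
    records: dict = field(default_factory=dict)

    def add_document(self, doc_id: str, text: str, datestamp: int) -> None:
        # Indexing is done at the source: only term counts are kept for export.
        self.records[doc_id] = (datestamp, Counter(text.lower().split()))

    def list_records(self, since: int) -> list:
        # Incremental export: only records changed after the given datestamp.
        return [(doc_id, stamp, stats)
                for doc_id, (stamp, stats) in self.records.items()
                if stamp > since]


@dataclass
class Harvester:
    """Hypothetical central service that merges harvested statistics."""
    index: dict = field(default_factory=dict)         # term -> {(source, doc_id): tf}
    last_harvest: dict = field(default_factory=dict)  # source name -> datestamp

    def harvest(self, source: Source) -> int:
        since = self.last_harvest.get(source.name, -1)
        changed = source.list_records(since)
        for doc_id, stamp, stats in changed:
            key = (source.name, doc_id)
            # Drop stale postings of a re-indexed document before merging.
            for postings in self.index.values():
                postings.pop(key, None)
            for term, tf in stats.items():
                self.index.setdefault(term, {})[key] = tf
            since = max(since, stamp)
        self.last_harvest[source.name] = since
        return len(changed)

    def search(self, term: str) -> list:
        # Retrieval runs centrally, ranking by raw term frequency.
        postings = self.index.get(term.lower(), {})
        return sorted(postings, key=postings.get, reverse=True)
```

In this sketch a second call to `harvest` on an unchanged source transfers nothing, which is the incremental behaviour the abstract refers to. Ranking by raw term frequency is a placeholder for whatever retrieval model the central service would actually apply.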

Similar resources

Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field which has received great attention in recent decades. In this paper, we present an approach to image retrieval based on the combination of text-based and content-based features. Keywords are used as text-based features, and color and texture are used as content-based features. Query in this system contains some keywords and an input...

Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features were assigned to each group of images separately. In this regard, each group image surr...

Metadata harvesting for content-based distributed information retrieval

We propose an approach to content-based Distributed Information Retrieval based on the periodic and incremental centralization of full-content indices of widely dispersed and autonomously managed document sources. Inspired by the success of the Open Archives Initiative's (OAI) protocol for metadata harvesting, the approach occupies middle ground between content crawling and distributed retrieval...

Overview of the Full-Text Document Retrieval Benchmark

8.1 Introduction: For most of recorded history, textual data have existed primarily in hardcopy format, and the related document retrieval process was essentially a manual task, possibly involving the assistance of cross-reference catalogs. By the mid-1960s, work was under way at the University of Pittsburgh to develop computer-assisted legal research systems [Harrington, 1984–85]. Also, during ...

The Study on Lucene Based IETM Information Retrieval

With the intensive and large-scale application of IETM in integrated equipment support, information retrieval has become one of the key technologies. This article discusses full-text search technology and the Lucene full-text retrieval engine, and combines them to develop a high-performance, scalable IETM full-text retrieval system. This system can effectively deal with IETM unstruct...

Publication date: 2005